Effect of Reward Function Choices in MDPs with Value-at-Risk

Authors

  • Shuai Ma
  • Jia Yuan Yu
Abstract

This paper studies Value-at-Risk problems in finite-horizon Markov decision processes (MDPs) with finite state space and two forms of reward function. First, we study the effect of the reward function on two criteria in a short-horizon MDP. Second, for long-horizon MDPs, we estimate the total reward distribution in a finite-horizon Markov chain (MC) with the help of spectral theory and the central limit theorem, and present a transformation algorithm for MCs with a three-argument reward function and a salvage reward.
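The CLT-based estimation described in the abstract can be sketched as follows. Note this is a minimal illustration, not the paper's algorithm: the two-state chain, state-based reward, and horizon are hypothetical, and the Value-at-Risk at level alpha is taken here as the alpha-quantile of the total-reward distribution.

```python
import numpy as np
from statistics import NormalDist

# Hypothetical 2-state Markov chain with a state-based reward (illustrative only).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])   # transition matrix
r = np.array([1.0, -0.5])    # reward collected in each state
T = 50                       # horizon
rng = np.random.default_rng(0)

def simulate_total_reward(n_paths=5000):
    """Monte Carlo samples of the total reward over horizon T."""
    totals = np.zeros(n_paths)
    for i in range(n_paths):
        s, total = 0, 0.0
        for _ in range(T):
            total += r[s]
            s = rng.choice(2, p=P[s])
        totals[i] = total
    return totals

totals = simulate_total_reward()

# Empirical Value-at-Risk: the alpha-quantile of the total-reward distribution.
alpha = 0.1
var_empirical = np.quantile(totals, alpha)

# Normal approximation in the spirit of the CLT argument for long horizons:
mu, sigma = totals.mean(), totals.std()
var_normal = mu + sigma * NormalDist().inv_cdf(alpha)

print(var_empirical, var_normal)
```

For a long horizon the total reward of an ergodic chain is approximately normal, which is what makes the second, closed-form estimate reasonable.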


Similar Articles

Dopamine Reward Prediction Error Responses Reflect Marginal Utility

BACKGROUND Optimal choices require an accurate neuronal representation of economic value. In economics, utility functions are mathematical representations of subjective value that can be constructed from choices under risk. Utility usually exhibits a nonlinear relationship to physical reward value that corresponds to risk attitudes and reflects the increasing or decreasing marginal utility obta...


Probabilistic Planning with Risk-Sensitive Criterion

Probabilistic planning models and, in particular, Markov Decision Processes (MDPs), Partially Observable Markov Decision Processes (POMDPs) and Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) have been extensively used by AI and Decision Theoretic communities for planning under uncertainty. Typically, the solvers for probabilistic planning models find policies that min...


2 Finite State and Action Mdps

In this chapter we study Markov decision processes (MDPs) with finite state and action spaces. This is the classical theory developed since the end of the fifties. We consider finite and infinite horizon models. For the finite horizon model the utility function of the total expected reward is commonly used. For the infinite horizon the utility function is less obvious. We consider several criteria: total...
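For the finite-horizon, total-expected-reward criterion mentioned above, the standard solution method is backward induction. A minimal sketch, using a hypothetical 2-state, 2-action MDP (the numbers are illustrative assumptions, not from the chapter):

```python
import numpy as np

# Hypothetical MDP: P[a][s][j] = transition probability under action a;
# R[s][a] = immediate reward for taking action a in state s.
P = np.array([[[0.8, 0.2],
               [0.3, 0.7]],
              [[0.5, 0.5],
               [0.9, 0.1]]])
R = np.array([[1.0, 0.5],
              [0.0, 2.0]])
T = 10  # horizon

# Backward induction: work from the terminal stage to stage 0.
V = np.zeros(2)                       # terminal values
policy = np.zeros((T, 2), dtype=int)  # optimal action per (stage, state)
for t in reversed(range(T)):
    # Q[s, a] = R[s, a] + sum_j P[a][s][j] * V[j]
    Q = R + np.einsum('asj,j->sa', P, V)
    policy[t] = Q.argmax(axis=1)
    V = Q.max(axis=1)

print(V)  # optimal total expected reward from each initial state
```

The resulting policy is stage-dependent, which is the hallmark of the finite-horizon criterion; the infinite-horizon criteria the chapter goes on to discuss admit stationary optimal policies instead.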


Thresholded Rewards: Acting Optimally in Timed, Zero-Sum Games

In timed, zero-sum games, the goal is to maximize the probability of winning, which is not necessarily the same as maximizing our expected reward. We consider cumulative intermediate reward to be the difference between our score and our opponent’s score; the “true” reward of a win, loss, or tie is determined at the end of a game by applying a threshold function to the cumulative intermediate re...


Heuristic Search for Generalized Stochastic Shortest Path MDPs

Research in efficient methods for solving infinite-horizon MDPs has so far concentrated primarily on discounted MDPs and the more general stochastic shortest path problems (SSPs). These are MDPs with 1) an optimal value function V ∗ that is the unique solution of the Bellman equation and 2) optimal policies that are the greedy policies w.r.t. V ∗. This paper’s main contribution is the description o...



Journal title:

Volume   Issue

Pages  -

Publication date: 2016